Abstract. This paper addresses the uniform random generation of words from a context-free language (over an alphabet of size $k$), while constraining every letter to a targeted frequency of occurrence. Our approach consists in a multidimensional extension of Boltzmann samplers [7].

We show that, under mostly strong-connectivity hypotheses, our samplers return a word of size in $[(1-\varepsilon)n, (1+\varepsilon)n]$ and exact frequency in $O(n^{1+k/2})$ expected time.

Moreover, if we accept tolerance intervals of width in $\Omega(\sqrt{n})$ for the number of occurrences of each letter, our samplers perform an approximate-size generation of words in expected $O(n)$ time. We illustrate these techniques on the generation of Tetris tessellations with uniform statistics in the different types of tetraminoes.

1 Introduction

Random generation is the core of the simulation of complex data. It appears in real application domains such as complex networks (biology, Internet or social relationships) or software testing (validation, benchmarking). It helps us to predict the behavior of algorithms (complexities and statistical significance of results), to visualize limit properties (such as phase transitions in statistical physics), and to model real contexts (random graphs for web simulation).
Following the pioneering work of Flajolet et al. [10], decomposable combinatorial classes can be specified using standard specifications. Two major techniques can then be applied to draw $m$ objects of size $n$ at random from such a class. On the one hand, the recursive approach [14] precomputes the cardinalities of sub-classes for sizes up to $n$ and uses these numbers to perform local choices that are consistent with the targeted uniformity. The best known optimization of this technique [5] uses certified floating-point arithmetic and works in $O(m \cdot n^{1+o(1)})$, but its implementation remains highly non-trivial due to its sophisticated precomputations. On the other hand, the Boltzmann sampling techniques, recently introduced by Duchon et al. [7], achieve random generation for most unlabelled [8] and labelled specifications in $O(m \cdot n^2)$ operations at an optimally low $O(m \cdot n)$ memory cost. Instead of enforcing a strict (and costly) control on the size of generated objects, this general technique rather induces an appropriate distribution on the size of sampled objects, and performs rejection until a suitable object is found.

In the present work, we investigate a natural multivariate extension of Boltzmann sampling, aiming at drawing objects, uniformly at random, having a prescribed composition in the different terminal letters. From a combinatorial perspective, such a generation allows the so-called symbolic method to reclaim combinatorial classes and languages that fall slightly outside of its natural expressivity. For instance, restrictions of rational languages may not admit a rational (or even context-free) specification under the additional hypothesis that some letters co-occur strictly (one may consider the triple-copy language). For context-free languages on $k$ letters, this problem was previously addressed within the recursive framework [14] by Denise et al. [5], deriving algorithms in $O(n^k)$ and $O(n^{2k})$ arithmetic operations, respectively for rational and context-free languages. Using properties of holonomic series, Bertoni et al. [3] revisited the problem and proposed a method for the uniform sampling from rational languages on two letters in $O(n)$. Unfortunately, a direct generalization of the technique yields an algorithm in $O(n^{k-1})$ for $k$ letters, as pointed out in Radicioni's thesis [13].

Following the general philosophy of Boltzmann sampling, our algorithm will first relax the compositional constraint, using non-uniform samplers to draw objects whose average composition is fine-tuned to match the targeted one, and perform rejection until an acceptable object is found. By acceptable, one understands that generated objects must feature prescribed size and composition, while tolerances may be allowed on both requirements.

Table 1. Average-case complexities of our samplers for a word of length $n$ over $k$ letters in strongly connected context-free languages under different tolerances.

Our programme can then be summarized in the three following phases:

Phase I. Figure out a set of weights such that the expected composition matches the targeted one.
Phase II. Draw structures from a weighted distribution, using either the recursive approach (see [5]) or a weighted Boltzmann sampler (see Section 4).
Phase III. Reject structures of unsuitable compositions, until an adequate object is generated and returned.

Although Phases II and III are independently addressed in our analyses, one can (and will) combine them into a single rejection step when a weighted Boltzmann sampler is used for Phase II. The algorithmic aspects of our programme will essentially build on and extend previous works addressing the uniform version, but a general analysis of its overall performance is more challenging. Indeed, the complexity of the rejection Phase III is heavily related to a general analysis of the limiting distribution of the associated multivariate (parameter-induced) generating functions. For each phase, we attempt to give mathematical characterizations of classes having proper behaviors. In particular, for context-free languages whose grammars are strongly connected and aperiodic, we obtain, for each combination of tolerances, the complexities summarized in Table 1.

The plan of this paper follows the different phases. Section 2 defines the concepts and notations used throughout the paper. Section 3 explains how to efficiently tune the parameters such that the targeted composition matches the average behavior (Phase I). In Section 4, we discuss the complexity of Phase II, the number of rejections needed to reach a word of suitable size (or suitable approximate size). The complexity of the multidimensional rejection (Phase III) is addressed in Section 5. We illustrate our method in Section 6 by sampling perfect Tetris tessellations, i.e. tessellations of a $w \times h$ rectangle using balanced lists of tetraminoes. Finally, we conclude with a short overview of future works.

2 Notations and definitions 

Following traditional mathematical notations, we will use bold symbols for multi-dimensional variables/functions (e.g. $\mathbf{x}$), and use subscripts to access a specific dimension (e.g. $x_i$). Throughout the rest of the document, we will denote by $\Sigma$ the alphabet of $k$ letters, by $\mathcal{C}$ a context-free language over $\Sigma$, and by $n$ the length of generated words.

Composition and tolerance. Define the composition of sampled words as the frequency of occurrences of each letter $t_i$ in a word $w \in \mathcal{C}$, denoted by $p(w) := (|w|_{t_i}/n)_{i \in [1,k]}$. Our main goal is to generate, uniformly at random, some word $w \in \mathcal{C}$ having a composition that is close to a targeted composition $f \in [0,1]^k$ such that $\sum_{i \in [1,k]} f_i = 1$.

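Epsilon: $C = \varepsilon$; $C_\pi(z) = 1$; $\Gamma C_\pi(x) = \varepsilon$.
Letters: $C = t$; $C_\pi(z) = \pi_t z$; $\Gamma C_\pi(x) = t$.
Union: $C = A + B$; $C_\pi(z) = A_\pi(z) + B_\pi(z)$; $\Gamma C_\pi(x) = \mathrm{Bern}\left(\frac{A_\pi(x)}{C_\pi(x)}, \frac{B_\pi(x)}{C_\pi(x)}\right) \rightarrow \Gamma A_\pi(x) \mid \Gamma B_\pi(x)$.
Product: $C = A \times B$; $C_\pi(z) = A_\pi(z) \times B_\pi(z)$; $\Gamma C_\pi(x) = (\Gamma A_\pi(x), \Gamma B_\pi(x))$.

Fig. 1. Weighted generating functions and associated Boltzmann sampler $\Gamma C_\pi(x)$ for context-free languages.

We make this notion of proximity explicit, and formalize the notion of acceptability for a sampled word. Namely, let $\varepsilon$ be a $k$-tuple of positive real numbers and $\alpha \in \mathbb{Q}^+$ a rational exponent; an object $w \in \mathcal{C}$ qualifies as $(\varepsilon, \alpha)$-acceptable if and only if

$$p(w)_i \in I(f_i, \varepsilon_i, \alpha), \quad \text{for all } i \in [1,k],$$

where $I(f, \varepsilon, \alpha) := [f - f^\alpha n^{\alpha-1}\varepsilon,\ f + f^\alpha n^{\alpha-1}\varepsilon]$. This definition captures the case of fixed (exact) compositions by setting $\alpha = 1$ and $\varepsilon_i = 1/n$, for all $i \in [1,k]$: the number of occurrences of $t_i$ must then lie within $f_i < 1$ of $n f_i$.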
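As a minimal sketch (an illustration, not part of the original method), the $(\varepsilon, \alpha)$-acceptability test can be written directly from the definition of $I(f, \varepsilon, \alpha)$. The function names `tolerance_interval` and `is_acceptable` are hypothetical, and words are assumed to be plain Python strings over a given alphabet.

```python
from typing import List, Tuple


def tolerance_interval(f: float, eps: float, alpha: float, n: int) -> Tuple[float, float]:
    """I(f, eps, alpha) = [f - f^alpha n^(alpha-1) eps, f + f^alpha n^(alpha-1) eps]."""
    half_width = f ** alpha * n ** (alpha - 1) * eps
    return f - half_width, f + half_width


def is_acceptable(word: str, alphabet: str, f: List[float], eps: List[float], alpha: float) -> bool:
    """(eps, alpha)-acceptability: every frequency p(w)_i lies in I(f_i, eps_i, alpha)."""
    n = len(word)
    for letter, f_i, eps_i in zip(alphabet, f, eps):
        freq = word.count(letter) / n          # p(w)_i = |w|_{t_i} / n
        lo, hi = tolerance_interval(f_i, eps_i, alpha, n)
        if not lo <= freq <= hi:
            return False
    return True


# Exact composition: alpha = 1 and eps_i = 1/n (here n = 4).
print(is_acceptable("abab", "ab", [0.5, 0.5], [0.25, 0.25], 1.0))  # True
print(is_acceptable("aaab", "ab", [0.5, 0.5], [0.25, 0.25], 1.0))  # False
```

With $\alpha = 1/2$ and a constant $\varepsilon_i$, the accepted window for the count of $t_i$ has width of order $\sqrt{n f_i}$, which is the $\Omega(\sqrt{n})$ tolerance mentioned in the abstract.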

Weighted distributions. The following notions and definitions, recalled here for the sake of self-containment, can be found in Denise et al. [5]. A positive weight vector $\pi$ assigns positive weights $\pi_i \in \mathbb{R}^+$ to each letter $t_i \in \Sigma$. The weight is then extended multiplicatively on any object $w$ by $\pi(w) = \prod_{t \in \Sigma} \pi_t^{|w|_t}$. This gives rise to the notion of weighted generating function $C_\pi(z)$ for a context-free language $\mathcal{C}$, a natural generalization of the size (enumerative) generating function where each structure is counted with multiplicity equal to its weight:

$$C_\pi(z) = \sum_{w \in \mathcal{C}} \pi(w)\, z^{|w|} = \sum_{w \in \mathcal{C}} \pi_1^{|w|_{t_1}} \cdots \pi_k^{|w|_{t_k}}\, z^{|w|} = \sum_{n \geq 0} c_{\pi,n}\, z^n,$$

where $c_{\pi,n}$ is the total weight³ of objects of size $n$. Notice that this generating function can be reinterpreted as a multivariate generating function in $\pi$ and $z$.

This weighting scheme implicitly defines a weighted distribution on the set $\mathcal{C}_n$ of words of size $n$, such that

$$\mathbb{P}_\pi(w \mid n = |w|) = \frac{\pi(w)}{\sum_{w' \in \mathcal{C}_n} \pi(w')} = \frac{\pi(w)}{c_{\pi,n}}.$$



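Finally, the weighted distribution generalizes to a Boltzmann weighted distribution on the whole language such that

$$\mathbb{P}_{x,\pi}(w) = \frac{\pi(w)\, x^{|w|}}{\sum_{w' \in \mathcal{C}} \pi(w')\, x^{|w'|}} = \frac{\pi(w)\, x^{|w|}}{C_\pi(x)}. \qquad (1)$$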

Property 1. Let $N$ (resp. $N_i$) be the random variable associated with the size (resp. number of occurrences of a letter $t_i$) of a word in an $(x, \pi)$-Boltzmann weighted distribution over a class $\mathcal{C}$. Then the expectations of $N$ and $N_i$ are related to the partial derivatives of the multivariate generating function $C_\pi(z)$ through

$$\mathbb{E}_{x,\pi}(N) = x\,\frac{\partial C_\pi(x)/\partial x}{C_\pi(x)} \quad\text{and}\quad \mathbb{E}_{x,\pi}(N_i) = \pi_i\,\frac{\partial C_\pi(x)/\partial \pi_i}{C_\pi(x)}. \qquad (2)$$

In the sequel we will denote by $\mu(x,\pi)$ the vector of expectations $(\mathbb{E}_{x,\pi}(N_1), \ldots, \mathbb{E}_{x,\pi}(N_k))$.
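Equation (2) can be sanity-checked numerically. The sketch below assumes the hypothetical class of all words over $\{a, b\}$, whose weighted generating function is $C_\pi(x) = 1/(1 - x(\pi_a + \pi_b))$, and approximates the partial derivatives by central finite differences; the helper names are illustrative only.

```python
def C(x, pa, pb):
    """Weighted GF of all words over {a, b}: C_pi(x) = 1 / (1 - x (pa + pb))."""
    return 1.0 / (1.0 - x * (pa + pb))


def boltzmann_expectations(gf, x, weights, h=1e-6):
    """Property 1: E(N) = x C_x / C and E(N_i) = pi_i C_{pi_i} / C (central differences)."""
    c = gf(x, *weights)
    e_size = x * (gf(x + h, *weights) - gf(x - h, *weights)) / (2 * h) / c
    e_letters = []
    for i, p in enumerate(weights):
        up, down = list(weights), list(weights)
        up[i], down[i] = p + h, p - h
        e_letters.append(p * (gf(x, *up) - gf(x, *down)) / (2 * h) / c)
    return e_size, e_letters


e_n, (e_a, e_b) = boltzmann_expectations(C, 0.4, [0.5, 1.0])
print(round(e_n, 6), round(e_a, 6), round(e_b, 6))  # 1.5 0.5 1.0
```

For $x = 0.4$, $\pi_a = 0.5$ and $\pi_b = 1$, the closed forms $\mathbb{E}(N) = x(\pi_a+\pi_b)/(1 - x(\pi_a+\pi_b))$ and $\mathbb{E}(N_a) = \pi_a x/(1 - x(\pi_a+\pi_b))$ give $1.5$ and $0.5$. The identity $\mathbb{E}(N) = \sum_i \mathbb{E}(N_i)$ holds because every letter contributes exactly one unit of size.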



3 Tuning weights (Phase I) 

First, let us address the question of finding a vector $\pi$ such that the multidimensional rejection scheme (Phase III) is as efficient as possible. We propose and explore two alternatives, both computing a weight vector that makes the expected and targeted compositions coincide. The first one uses a numerical Newton iteration. The second one uses an asymptotic approximation for the value of $z$, which greatly simplifies the weights/frequencies relationship.



3 This quantity is essentially similar to the partition function in statistical mechanics, introduced by L. 
Boltzmann. 



Input: Initial parameters z_0 and π_0, a composition f, a size n and ε a numerical precision
Output: The valid weights

Let E_{z_0} be the map from the space of the weights into ℝ₊^k such that E_{z_0}(π) = μ(z_0, π);
Let J(E_{z_0})(π) be the Jacobian matrix of E_{z_0} evaluated at π;
π := π_0;
repeat
    done := true; c := nf; N := ‖c − E_{z_0}(π)‖;
    while N > ε do
        π_alt := π;
        π := J(E_{z_0})^{-1}(π) · (c − E_{z_0}(π)) + π;
        if N < ‖c − E_{z_0}(π)‖ then
            π := π_alt; c := (c + E_{z_0}(π))/2; done := false;
        end
        N := ‖c − E_{z_0}(π)‖;
    end
until done = true;
return π

Algorithm 1: Tracking the weights.

Tuning by expectation. Newton's methods are based on successive linear (or higher-order) approximations in order to obtain numerical estimates of a root of a system of equations. It is generally an efficient algorithm, assuming that the initial values are close enough to a root. Here, we are interested in finding the unique root $(z_0, \pi_f)$ of the system $\mu(z_0, \pi) = nf$. Algorithm 1 is a slightly revisited version of Newton's method which tests at each step whether Newton's approximation has improved the estimate of the root. This test fails if and only if the current parameters are too far from the solution. In this case, we use dichotomy to search for an intermediate target that is closer to the solution than the current parameters.

Proposition 1. Let $f$ and $n$ be the targeted composition and size, respectively. Assume that the Jacobian matrix $J(E_{z_0})(\pi_f)$ is not singular⁴; then Algorithm 1 returns $(z_0, \pi_1)$ such that the expected composition $\mu(z_0, \pi_1)$ satisfies $\|\mu(z_0, \pi_1) - nf\| \leq \varepsilon$.

Moreover, there exists a neighborhood $B$ of $(z_0, \pi_f)$ such that, for any $\pi_0 \in B$, Algorithm 1 with initial condition $\pi_0$ converges quadratically to $\pi_f$ (i.e. $\exists C > 1$ such that $\forall k > 0$, $\|\pi_k - \pi_f\| < C^{-2^k}$, where $\pi_{k+1} := J(E_{z_0})^{-1}(\pi_k) \cdot (nf - E_{z_0}(\pi_k)) + \pi_k$).
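The following Python sketch mirrors Algorithm 1 under stated assumptions: the language is a hypothetical toy example (words over $\{a, b\}$ avoiding the factor $bb$, with $C_\pi(z) = (1 + \pi_b z)/(1 - \pi_a z - \pi_a\pi_b z^2)$), $z_0 = 0.3$ is fixed, $\mu(z_0, \pi)$ is computed in closed form from Equation (2), and the Jacobian is approximated by finite differences. A rejected Newton step moves the intermediate target $c$ halfway towards the current expectation, as in the dichotomy of Algorithm 1.

```python
import math

Z0 = 0.3  # fixed Boltzmann parameter z_0 (illustrative choice)


def expectations(pi):
    """mu(z0, pi) for the toy GF (1 + pb z) / (1 - pa z - pa pb z^2), via Equation (2).
    Returns None when (z0, pi) lies outside the domain of convergence."""
    pa, pb = pi
    d = 1.0 - pa * Z0 - pa * pb * Z0 * Z0
    if pa <= 0 or pb <= 0 or d <= 0:
        return None
    e_a = pa * (Z0 + pb * Z0 * Z0) / d
    e_b = pb * Z0 / (1.0 + pb * Z0) + pa * pb * Z0 * Z0 / d
    return [e_a, e_b]


def solve(a, b):
    """Gaussian elimination with partial pivoting for the square system a x = b."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def jacobian(E, pi, h=1e-7):
    """Central-difference Jacobian J[i][j] = dE_i / dpi_j."""
    k = len(pi)
    jac = [[0.0] * k for _ in range(k)]
    for j in range(k):
        up, down = list(pi), list(pi)
        up[j] += h
        down[j] -= h
        e_up, e_down = E(up), E(down)
        for i in range(k):
            jac[i][j] = (e_up[i] - e_down[i]) / (2 * h)
    return jac


def distance(c, e):
    return math.inf if e is None else math.dist(c, e)


def tune_weights(E, target, pi0, tol=1e-8, max_steps=100_000):
    """Newton iteration with dichotomy on the intermediate target c (Algorithm 1)."""
    if E(pi0) is None:
        raise ValueError("initial weights outside the domain of convergence")
    pi, steps, done = list(pi0), 0, False
    while not done:
        done = True
        c = list(target)                          # c := n f
        err = distance(c, E(pi))
        while err > tol:
            steps += 1
            if steps > max_steps:
                raise RuntimeError("Newton iteration did not converge")
            e = E(pi)
            step = solve(jacobian(E, pi), [ci - ei for ci, ei in zip(c, e)])
            candidate = [p + s for p, s in zip(pi, step)]
            new_err = distance(c, E(candidate))
            if new_err < err:                     # the Newton step improved the estimate
                pi, err = candidate, new_err
            else:                                 # too far: keep pi, halve the distance to c
                c = [(ci + ei) / 2 for ci, ei in zip(c, e)]
                err = distance(c, e)
                done = False
    return pi


weights = tune_weights(expectations, target=[12.0, 8.0], pi0=[1.0, 1.0])  # n = 20, f = (0.6, 0.4)
print(weights, expectations(weights))
```

With the target $nf = (12, 8)$, the iteration starts far from the solution, so full Newton steps initially leave the domain of convergence and the dichotomy on $c$ is exercised before the quadratic phase takes over.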

Asymptotic tuning. Since one generally attempts to generate large objects, a natural option consists in solving the simpler asymptotic system.

Proposition 2. Let us consider a language whose grammar is irreducible and aperiodic and whose generating function $C_\pi(z)$ admits $\rho(\pi)$ as dominant singularity. Then, for any letter $t$, as $z$ tends to $\rho(\pi)$, it holds that

$$\mathbb{E}_{z,\pi}(N_t) \sim -\frac{\pi_t}{\rho(\pi)}\,\frac{\partial \rho(\pi)}{\partial \pi_t}\; n, \qquad \text{where } n = \mathbb{E}_{z,\pi}(N),$$

whether $\rho(\pi)$ is a rational (polar) or an algebraic singularity.

Remark 1. Consider the expectation $\mathbb{E}_n(N_t)$ of the number of letters $t$ in a word of fixed size $n$. Then, from [5], similar asymptotic estimates hold for $\mathbb{E}_n(N_t)$, and the weights computed by our methods can therefore be used by the recursive approach.
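A minimal sketch of the asymptotic tuning of Proposition 2 on a hypothetical toy language (words over $\{a, b\}$ avoiding $bb$): its dominant singularity $\rho(\pi)$ is the positive root of $\pi_a\pi_b z^2 + \pi_a z - 1$, the limiting frequency of each letter is estimated as $-(\pi_t/\rho)\,\partial\rho/\partial\pi_t$ by finite differences, and $\pi_a = 1$ is fixed since only weight ratios matter ($\rho(\lambda\pi) = \rho(\pi)/\lambda$). The function names are illustrative.

```python
import math


def rho(pa, pb):
    """Dominant singularity of (1 + pb z) / (1 - pa z - pa pb z^2):
    the positive root of pa pb z^2 + pa z - 1 = 0."""
    return (-pa + math.sqrt(pa * pa + 4.0 * pa * pb)) / (2.0 * pa * pb)


def asymptotic_frequencies(pa, pb, rel_h=1e-6):
    """f_t = -(pi_t / rho) d rho / d pi_t (Proposition 2), by central differences."""
    r = rho(pa, pb)
    ha, hb = rel_h * pa, rel_h * pb
    d_pa = (rho(pa + ha, pb) - rho(pa - ha, pb)) / (2.0 * ha)
    d_pb = (rho(pa, pb + hb) - rho(pa, pb - hb)) / (2.0 * hb)
    return -pa * d_pa / r, -pb * d_pb / r


def tune_b_weight(f_b, lo=1e-6, hi=1e6, iterations=100):
    """Find pb (with pa = 1) whose limiting frequency of b equals f_b.
    This frequency increases with pb and stays below 1/2 for this language."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # bisection on a logarithmic scale
        if asymptotic_frequencies(1.0, mid)[1] < f_b:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


print(round(tune_b_weight(0.4), 4))  # 6.0
```

For this toy language one can check by hand that the limiting frequency of $b$ is $rs/(1 + 2rs)$, with $r = \pi_b/\pi_a$ and $s = \pi_a\rho$, so the target $f_b = 0.4$ corresponds to $r = 6$, which is what the bisection recovers.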

4 Efficiency of the size rejection scheme (Phase II) 

At this point, we assume that a $k$-tuple of weights $\pi$ has been found such that the average composition in the weighted distribution matches the targeted one. We now need to perform a random generation of $m$ words from the context-free language with respect to the $\pi$-weighted distribution.

4 I.e. there is no linear dependency between the expected numbers of different letters. 



Input: Parameters x, π
Output: Object of A of size in I(n, ε) := [n(1 − ε), n(1 + ε)]
repeat
    γ := ΓA_π(x)
until |γ| ∈ I(n, ε);
return γ

Algorithm 2: Rejection algorithm Γ2A(x, π; n, ε)



This problem was previously addressed in [5] within the framework of the recursive method, and an algorithm in $O(m \cdot n)$ arithmetic operations was proposed. Despite its apparently low complexity, the exponential growth of the numbers processed by the algorithm increases the practical complexity to $O(m \cdot n^2)$ in time and $O(n^2)$ in memory.

Let us investigate a weighted generalization of Boltzmann sampling. First, let us recall that Boltzmann sampling relaxes the size constraint and draws objects in a Boltzmann distribution of parameter $x$. To that purpose, a fixed stochastic process, coupled with an (anticipated) rejection procedure, is used (see Algorithm 2). The probabilities of the different alternatives are precomputed by an external procedure called an oracle (symbolic algebra, or the numerical method of [12]). A judicious choice of value for $x$ ensures a low probability of rejection, and this approach yields, for large classes of structures (trees, sequences, runs, mappings, fountains, ...), generic algorithms in $O(n^2)$ for objects of exact size $n$, and in $O(n)$ for objects of approximate size in $[n(1-\varepsilon), n(1+\varepsilon)]$, for some $\varepsilon > 0$.

Through a minor modification of the oracle, one can easily turn unlabelled Boltzmann samplers, introduced in [8], into generators for the weighted Boltzmann distribution (see Equation 1). Namely, one only needs to replace any occurrence of the generating function $C(z)$ by its weighted counterpart $C_\pi(z)$, obtaining the generic samplers summarized in Figure 1, and use the classic size rejection process (Algorithm 2).
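To make the rules of Figure 1 concrete, here is a minimal Python sketch of a weighted Boltzmann sampler coupled with the size rejection of Algorithm 2. It assumes the hypothetical unambiguous grammar $L = \varepsilon \mid b \mid a\,L \mid b\,a\,L$ (words avoiding $bb$), whose weighted generating function $(1 + \pi_b x)/(1 - \pi_a x - \pi_a\pi_b x^2)$ serves as the oracle; the union rule becomes a Bernoulli choice, the product rule a concatenation, and the sampler aborts as soon as the size bound is exceeded (anticipated rejection). All function names are illustrative.

```python
import random


def expected_size(pa, pb, x):
    """E_{x,pi}(N) = x d/dx log C_pi(x) for C_pi(x) = (1 + pb x) / (1 - pa x - pa pb x^2)."""
    d = 1 - pa * x - pa * pb * x * x
    return x * (pb / (1 + pb * x) + (pa + 2 * pa * pb * x) / d)


def tune_x(pa, pb, n):
    """Bisection for the root x_n in (0, rho) of E_{x,pi}(N) = n (E increases with x)."""
    rho = (-pa + (pa * pa + 4 * pa * pb) ** 0.5) / (2 * pa * pb)
    lo, hi = 0.0, rho
    for _ in range(200):
        mid = (lo + hi) / 2
        if expected_size(pa, pb, mid) < n:
            lo = mid
        else:
            hi = mid
    return lo


def make_sampler(pa, pb, x):
    """Weighted Boltzmann sampler for the hypothetical grammar L = eps | b | a L | b a L."""
    L = (1 + pb * x) / (1 - pa * x - pa * pb * x * x)   # oracle: value of C_pi(x)
    # Union rule: each alternative is chosen with probability (its weighted GF) / L.
    p_eps, p_b, p_a = 1 / L, pb * x / L, pa * x         # p_a = (pa x L) / L; 'b a L' gets the rest

    def sample(rng, cap):
        out = []
        while len(out) <= cap:                           # anticipated rejection beyond cap
            u = rng.random()
            if u < p_eps:
                return "".join(out)
            if u < p_eps + p_b:
                out.append("b")
                return "".join(out) if len(out) <= cap else None
            # Product rule: emit the prefix letters, then keep sampling the tail L.
            out.extend("a" if u < p_eps + p_b + p_a else "ba")
        return None

    return sample


def size_rejection(sample, n, eps, rng):
    """Algorithm 2: repeat until the size lies in I(n, eps) = [n(1 - eps), n(1 + eps)]."""
    lo, hi = n * (1 - eps), n * (1 + eps)
    while True:
        word = sample(rng, cap=int(hi))
        if word is not None and lo <= len(word) <= hi:
            return word


rng = random.Random(2010)
x_50 = tune_x(1.0, 1.0, 50)
word = size_rejection(make_sampler(1.0, 1.0, x_50), 50, 0.1, rng)
print(len(word), word)
```

The tail-recursive shape of this grammar lets the recursion be unrolled into a loop; a general grammar would need an explicit stack of pending non-terminals.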

Proposition 3. Let $\pi$ be a $k$-tuple of weights, $x$ be a Boltzmann parameter, $\mathcal{C}$ be a context-free specification and $C_\pi(z)$ its weighted generating function.

Then the samplers $\Gamma C_\pi(x)$ summarized in Figure 1 generate any word $w \in \mathcal{C}$ of size $n = |w|$ with probability

$$\mathbb{P}_{x,\pi}(w \mid n) = \frac{\pi(w)\, x^n}{C_\pi(x)}.$$



The (renormalized) restriction of a $\pi$-weighted Boltzmann distribution to objects of size $n$ is clearly a $\pi$-weighted distribution, and this fact ensures the correctness of a rejection-based approach.

Let us qualify a context-free language as well-conditioned iff the singular exponent $\alpha_\pi$ of its dominant singularity is non-negative. Following [7], we observe that any grammar can be pointed repeatedly until the exponent of its generating function becomes non-negative. Moreover, the pointing operator leaves a weighted distribution unaffected within the subset of words of a given length. Therefore, we can restrict our analysis to grammars associated with flat Boltzmann distributions, generate words from the pointed grammars and erase the point(s) afterward.

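Theorem 1 (Essentially proven in [7]). Let $C_\pi$ be a weighted well-conditioned context-free language and $x_n$ be the root in $[0, \rho_\pi)$ of $\mathbb{E}_{x,\pi}(N) = n$. Then the complexity $X_\varepsilon[n]$ of the sampler $\Gamma 2C(x_n, \pi; n, \varepsilon)$ described in Algorithm 2 is such that

- If $\varepsilon = 0$ (exact size): $X_\varepsilon[n] \in O\left(\kappa\,\frac{\Gamma(\alpha_\pi)}{\alpha_\pi^{\alpha_\pi} e^{-\alpha_\pi}}\, n^2 + c(\pi)\, n\right)$
- If $\varepsilon > 0$ (approximate size): $X_\varepsilon[n] \in O\left(\frac{\kappa\, n}{\zeta_{\alpha_\pi}(\varepsilon)} + c(\pi)\right)$

where $\kappa$ is the cost-per-letter induced by the canonical Boltzmann samplers, $\alpha_\pi$ is the singular exponent of the dominant singularity of $C_\pi(z)$, $\zeta_\alpha(\varepsilon) := \frac{\alpha^\alpha}{\Gamma(\alpha)} \int_{-\varepsilon}^{\varepsilon} (1+s)^{\alpha-1} e^{-\alpha(1+s)}\, ds$, $\Gamma(x)$ is the gamma function, and $c(\pi)$ does not depend on $n$.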



In particular, for any fixed weight vector $\pi$, Theorem 1 implies an $O(n)$ (resp. $O(n^2)$) complexity for the approximate-size (resp. exact-size) weighted samplers. By contrast, using weights to enforce compositions that are unnatural (e.g. enforcing $O(\sqrt{n})$ occurrences of a letter occurring $\Theta(n)$ times in the uniform distribution) may lead to a (somewhat hidden) dependency of $\pi$ on $n$. Although we were unable to characterize these dependencies and their impact $c(\pi)$ on both complexities, we expect the latter to remain limited, and conjecture similar complexities when meaningful compositions are targeted. For instance, one may assume at least one occurrence of each letter (a realistic assumption, since prohibiting a letter is simply achieved through a grammar modification), so that the frequencies and the weights can be assumed to be bounded by some function of $n$.

In the case of rational languages, the following theorem provides a computable evaluation of the efficiency of the size-rejection process. It relies on the partial fraction expansion of rational functions, which can be obtained for any weighted generating function $C_\pi(z)$, and is denoted by

$$C_\pi(z) = \sum_{i=1}^{r} \sum_{k=1}^{m_i} \frac{h_{i,k}}{(1 - z/\rho_i)^{k}} + P(z), \qquad (3)$$

where $P(z)$ is a polynomial of degree bounded by a constant, $r$ is the number of distinct singularities and $m_i$ is the multiplicity of $\rho_i$, the singularities being sorted by increasing modulus. In weighted generating functions, the values of $\rho_i$, $P(z)$, $h_{i,k}$, $m_i$ and $r$ depend on the actual values of the weights.

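Theorem 2. Let $C_\pi$ be a weighted rational language, $x_n$ be the root in $[0, \rho_\pi)$ of $\mathbb{E}_{x,\pi}(N) = n$ and $\varepsilon > 0$ be a tolerance. Then the approximate-size sampler $\Gamma 2C(x_n, \pi; n, \varepsilon)$ succeeds after an expected number of trials $T_{C_\pi}(x_n, \varepsilon)$ equal to

$$T_{C_\pi}(x_n, \varepsilon) = \frac{C_\pi(x_n)}{\sum_{n' \in I(n,\varepsilon)} \left( \sum_{i=1}^{r} \sum_{k=1}^{m_i} \binom{n'+k-1}{k-1} \rho_i^{-n'} h_{i,k} + [z^{n'}] P(z) \right) x_n^{n'}}.$$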



5 Complexity of the multidimensional rejection (Phase III) 

Our approach relies on a rejection scheme that generalizes that of the classic (univariate) Boltzmann sampling. Words are drawn from a weighted distribution, rejecting those whose frequencies are too distant from the targeted one, until an acceptable one is found and returned. This gives the following rejection sampler $\Gamma 3A(x, \pi; n, m, \varepsilon, \alpha)$ for a language $A$, where $x$ is real, $\pi$ a real $k$-vector, $m$ a map from $\mathbb{N}$ to $\mathbb{R}^k$, and $\varepsilon$ the tolerance:

Input: The parameters x, π, n, m, ε, α
Output: An object of A of size s in I(n, ε) such that, for every parameter π_i, the number of occurrences of t_i is in
    I(m_i(s), ε, α) := [m_i(s) − m_i(s)^α ε, m_i(s) + m_i(s)^α ε]
repeat
    γ := Γ2A(x, π; n, ε)
until ∀i, |γ|_{t_i} ∈ I(m_i(|γ|), ε, α);
return γ

Algorithm 3: Γ3A(x, π; n, m, ε, α)

In many important classes of combinatorial structures, the composition of a random object is concentrated around its mean. It follows that a rejection-based generation can succeed after few attempts, provided that the expected composition matches the targeted one. Our main result is that, for any irreducible and simple-type context-free language, a suitably parameterized multidimensional rejection sampler generates a word of targeted composition after $O(n^{k/2})$ attempts. Moreover, allowing an $n^\beta$ ($\beta > 1/2$) tolerance on the number of occurrences of each letter yields a sampler that succeeds after an expected number of attempts that is asymptotically constant.
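The multidimensional rejection of Algorithm 3 can be sketched as follows. For simplicity, this assumes the hypothetical class of all words over the alphabet, whose Boltzmann sampler stops with probability $1 - x\sum_i \pi_i$ and otherwise emits $t_i$ with probability $\pi_i/\sum_j \pi_j$; for that class $m_i(s) = s\,\pi_i/\sum_j \pi_j$, and $x = n/(n+1)$ gives $\mathbb{E}_{x,\pi}(N) = n$. The small slack in the composition test only absorbs floating-point rounding, and the function names are illustrative.

```python
import random


def boltzmann_word(rng, weights, x, cap):
    """Weighted Boltzmann sampler for the hypothetical class C = eps + (t_1 + ... + t_k) C
    of all words, C_pi(x) = 1 / (1 - x * sum(pi)); returns None once the size exceeds cap."""
    letters = sorted(weights)
    go_on = x * sum(weights.values())              # probability of the non-empty branch (< 1)
    out = []
    while len(out) <= cap:
        if rng.random() >= go_on:                  # epsilon branch: stop here
            return "".join(out)
        out.append(rng.choices(letters, weights=[weights[t] for t in letters])[0])
    return None


def multidim_rejection(rng, weights, x, n, size_eps, m, comp_eps, alpha):
    """Algorithm 3: size rejection, then accept only if every count |w|_t lies in
    [m_t(s) - m_t(s)^alpha * comp_eps, m_t(s) + m_t(s)^alpha * comp_eps], with s = |w|."""
    lo, hi = n * (1 - size_eps), n * (1 + size_eps)
    trials = 0
    while True:
        trials += 1
        w = boltzmann_word(rng, weights, x, cap=int(hi))
        if w is None or not lo <= len(w) <= hi:
            continue                               # size rejection (Algorithm 2)
        if all(abs(w.count(t) - m_t) <= m_t ** alpha * comp_eps + 1e-9
               for t, m_t in m(len(w)).items()):
            return w, trials


weights = {"a": 0.3, "b": 0.7}                    # targeted composition f = (0.3, 0.7)


def m(s):
    """Expected counts at size s for this class: m_t(s) = s * pi_t / sum(pi)."""
    total = sum(weights.values())
    return {t: s * p / total for t, p in weights.items()}


n = 20
word, trials = multidim_rejection(random.Random(7), weights, n / (n + 1), n, 0.0, m, 0.0, 1.0)
print(word, trials)
```

Calling it with a size tolerance of 0, `comp_eps = 0` and `alpha = 1` gives the exact-composition sampler $\Gamma 3A(x_n, \pi; n, m, 0, 1)$; with `alpha = 0.5` and a constant `comp_eps`, it accepts counts within a window of width $\Theta(\sqrt{n})$.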



Now, let us denote by $U_n(\pi_0)$ the multivariate random variable which follows the probability

$$\mathbb{P}(U_n(\pi_0) = \mathbf{a}) = \frac{\pi_0^{\mathbf{a}}\, [z^n \pi^{\mathbf{a}}]\, C_\pi(z)}{[z^n]\, C_{\pi_0}(z)},$$

i.e. the distribution of the parameters for objects of size $n$ (here $\pi_0^{\mathbf{a}} := \prod_i \pi_{0,i}^{a_i}$). Moreover, let us denote by $\mu(n, \pi_0)$ the mean vector of $U_n(\pi_0)$ and by $V(n, \pi_0)$ its variance-covariance matrix. If there is no strict correlation between the parameters, the matrix $V(n, \pi_0)$ is positive definite (and so, invertible). We can then associate a norm to each composition vector $u$ through $\|u\|_{V^{-1}} := \sqrt{u^T V(n, \pi_0)^{-1} u}$. Now, let $V$ be a positive definite matrix; we denote by $\kappa(V) := \inf_{\|u\|_\infty = 1} \|u\|_V$ the infimum distance⁵ from the unit sphere to the center of the Banach space.
Definition 1. The $\alpha$-concentrated condition is defined as:

$$\limsup_{n \to \infty} \left(\|\mu(n, \pi)\|_\infty\right)^{\alpha} \cdot \kappa\left(V(n, \pi)^{-1}\right) = c > \sqrt{k}/\varepsilon.$$

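Theorem 3 (Approximate composition). Let $x_n$ and $\pi_a$ be the solution of $\mathbb{E}_{x,\pi}(N) = n$ and $\mathbb{E}_{x,\pi}(N_i) = a_i$. Let the map $m$ be defined by $m_i : s \mapsto \mathbb{E}_{s,\pi_a}(N_i)$, the expected number of occurrences of $t_i$ among words of size $s$, and assume that the $\alpha$-concentrated condition holds for some $\alpha < 1$. Then the expected number of trials (of $\Gamma 2A(x_n, \pi; n, \varepsilon)$) of the rejection sampler $\Gamma 3C(x_n, \pi_a; n, m, \varepsilon, \alpha)$ is upper-bounded by

$$\sup_{s \in I(n,\varepsilon)} \frac{1}{1 - k\left(\varepsilon\, \kappa\left(V(s,\pi_a)^{-1}\right) \|\mu(s,\pi_a)\|_\infty^{\alpha}\right)^{-2}},$$

which tends to a constant as $n \to \infty$.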



Theorem 4 (Exact composition). Assume that $(U_n(\pi_a))$ admits, as limiting distribution when $n$ tends to infinity, a multidimensional Gaussian law with mean $\mu$ and variance-covariance matrix $V$ proportional to $f(n)$. Then the exact-composition rejection sampler $\Gamma 3C(x_n, \pi_a; n, m, 0, 1)$ succeeds after an expected number of trials equal to $(2\pi)^{k/2} (\det V)^{1/2} = O(f(n)^{k/2})$.

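Proof. Just notice that the probability of drawing an exact composition corresponds to taking $u = \mu$ in the asymptotic estimate

$$\mathbb{P}(U_n(\pi_a) = u) \sim \frac{1}{(2\pi)^{k/2} (\det V)^{1/2}} \exp\left(-\frac{1}{2} (u - \mu)^T V^{-1} (u - \mu)\right).$$

Consequently, the expected number of attempts is $(2\pi)^{k/2} (\det V)^{1/2} = O(f(n)^{k/2})$.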
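The local Gaussian estimate used in this proof can be checked numerically in the simplest setting. The sketch below assumes two letters, a fixed length $n$ and independent letters (as in the class of all words over $\{a, b\}$ with $\pi_a/(\pi_a+\pi_b) = f$), so that only $N_a \sim \mathrm{Binomial}(n, f)$ varies freely and the relevant Gaussian is one-dimensional with variance $V = nf(1-f)$; the product of $(2\pi)^{1/2}V^{1/2}$ with the exact point probability should then approach 1. The function names are illustrative.

```python
import math


def exact_composition_probability(n, f):
    """P(N_a = n f) when each of the n letters is 'a' independently with probability f
    (binomial point probability, evaluated in log-space to avoid overflow)."""
    k = round(n * f)
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(f) + (n - k) * math.log(1 - f))
    return math.exp(log_p)


def gaussian_expected_trials(n, f):
    """(2 pi)^{d/2} det(V)^{1/2} with d = 1 free dimension and V = n f (1 - f)."""
    return math.sqrt(2 * math.pi * n * f * (1 - f))


for n in (100, 1000, 10000):
    trials = 1 / exact_composition_probability(n, 0.3)
    print(n, trials, gaussian_expected_trials(n, 0.3))
```

In this setting the expected number of trials grows like $\Theta(n^{1/2})$, consistent with the form $O(f(n)^{d/2})$ with $d = 1$ free dimension and $f(n) = n$.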



5.1 Rational languages: Bender-Richmond-Williamson theorem

The Bender-Richmond-Williamson theorem [2, Theorem 1] defines sufficient conditions such that the limiting distribution of a rational language $\mathcal{R}$ is a multidimensional normal distribution. Let us recall that a rational language is irreducible if its minimal automaton $A$ is strongly connected, and aperiodic if the cycle lengths in $A$ have greatest common divisor equal to 1. Additionally, the periodicity parameter lattice $\Lambda$, defined in [2, Definition 2], is required to be full dimensional to avoid trivial correlations in the occurrences of letters.

Theorem 5. Let $\mathcal{R}_\pi$ be a weighted rational language whose minimal automaton is irreducible and aperiodic, and $x_n$ be the root in $[0, \rho_\pi)$ of $\mathbb{E}_{x,\pi}(N) = n$. Assume that the periodicity parameter lattice $\Lambda$ is full dimensional. Then:

- $\forall \alpha > 1/2$, the approximate-composition sampler $\Gamma 3\mathcal{R}(x_n, \pi; n, \varepsilon, \alpha)$ succeeds after $O(1)$ trials;
- for $\alpha = 1/2$, $\exists \varepsilon_0$ such that $\forall \varepsilon > \varepsilon_0$, $\Gamma 3\mathcal{R}(x_n, \pi; n, \varepsilon, \alpha)$ succeeds after $O(1)$ trials;
- the exact-composition rejection sampler $\Gamma 3\mathcal{R}(x_n, \pi; n, 0, 1)$ succeeds after $O(n^{k/2})$ trials.

5 Recall that the infinity norm is defined as $\|u\|_\infty = \max(|u_1|, \ldots, |u_k|)$.

Proof. From the system of language equations $\mathcal{C} = M \cdot \mathcal{C} + E$, we directly obtain the system $L = zM \cdot L + E$ for the generating functions. In this case, the Perron-Frobenius theorem ensures that the dominant pole of every $L_i$ in $L$ is the smallest positive real root of $\det(I - z \cdot M) = 0$ and that this pole is simple. Now, assume that the periodicity parameter lattice $\Lambda$ defined in [2, Definition 2] is full dimensional. Assume also that we have a compact set $U_1$ for the parameters in which the singular exponent is constant and equal to 1. Then, from the Bender-Richmond-Williamson theorem (see [2, Theorem 1] and [1]), it follows that for any fixed parameter in the compact set $U_1$, the limiting distribution of the parameters is a multidimensional Gaussian distribution with mean and variance-covariance matrix proportional to $n$. Consequently, Theorem 3 applies for $\alpha > 1/2$, Theorem 4 applies with $f(n) = n$, and the result follows.
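For the Perron-Frobenius step of this proof, the dominant pole equals $1/\lambda_{\max}(M)$, where $\lambda_{\max}$ is the Perron eigenvalue of the (weighted, non-negative) transition matrix. The sketch below estimates it with the power method, which converges when $M$ is primitive (irreducible and aperiodic). The two-state automaton for words avoiding $bb$ is a hypothetical example, for which $\det(I - zM) = 1 - \pi_a z - \pi_a\pi_b z^2$.

```python
def dominant_pole(M, iterations=1000):
    """Smallest positive root of det(I - z M) for a primitive non-negative matrix M,
    computed as 1 / lambda_max with the power method (Perron-Frobenius)."""
    k = len(M)
    v = [1.0] * k
    lam = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(k)) for i in range(k)]
        lam = max(w)                 # v is normalised to max 1, so max(M v) -> lambda_max
        v = [wi / lam for wi in w]
    return 1.0 / lam


def weighted_transfer_matrix(pa, pb):
    """Hypothetical automaton for words avoiding 'bb': state 0 = last letter is not b,
    state 1 = last letter is b; M[q][q2] is the total weight of transitions q -> q2."""
    return [[pa, pb],
            [pa, 0.0]]


print(dominant_pole(weighted_transfer_matrix(1.0, 1.0)))
print(dominant_pole(weighted_transfer_matrix(1.0, 6.0)))
```

With unit weights the pole is $(\sqrt{5}-1)/2$, the inverse of the golden ratio; with $\pi_b = 6\pi_a$ the Perron eigenvalue is 3 and the pole is $1/3$.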

Let us discuss the prerequisites of Theorem 5. If the matrix $M$ is not aperiodic, there exists a power $d$ such that $M^d$ is aperiodic. So, we can always reduce the problem to a list of $d$ aperiodic ones, and Theorem 5 applies under the same assumptions (full-dimensional periodicity parameter lattice and compact set with constant singular exponents). The irreducibility requirement may be lifted when one of the strongly connected components dominates asymptotically, i.e. when the associated schema only involves subcritical and supercritical compositions [9, Theorem IX.2]. However, the case of a competition between different components in a non-irreducible automaton is much more challenging and requires serious developments that cannot be included in this short paper. Finally, we point out that, with minor modifications, similar results could be obtained for more general transfer matrix models.

5.2 Context-free languages: Drmota theorem 

A theorem of [6] gives very similar sufficient conditions for the limiting multivariate distribution to satisfy the conditions of Theorem 4. Namely, the irreducibility condition needs to be fulfilled by the dependency graph of the grammar, i.e. the directed graph on non-terminals whose edges connect left-hand sides of rules to their associated right-hand sides. The lattice and aperiodicity properties are replaced by the very similar concept of simple-type grammar, requiring the existence of a positive $(k+1)$-dimensional cone centered on in the space of coefficients.

Theorem 6. Let $\mathcal{C}_\pi$ be a weighted context-free language generated from a grammar $\mathcal{G}$ of simple type [6, Theorem 1] and whose dependency graph is strongly connected. Then the complexities summarized in Theorem 5 also hold for $\mathcal{C}_\pi$.

Again, the strong-connectedness requirement could be relaxed for disconnected grammars whose 
behavior is dominated by that of a single connected component. A formal characterization of such 
grammars can be interpreted in the theory of (sub/super)-critical compositions [9, Theorem IX. 2]. 

6 Sampling perfect Tetris tessellations

In this short illustration, we address the generation of Tetris tessellations, i.e. tessellations using tetraminoes of a board having prescribed width $w$. The Tetris game consists in placing falling tetraminoes (or pieces) $\mathcal{P}$ in a $w \times h$ board. The goal of the player is to create hole-free horizontal lines, which are then eliminated, and the game goes on until some piece stacks past the board ceiling. Most implementations of Tetris use the so-called bag strategy, which consists in giving the player sequences of permutations of the 7 types of tetraminoes, therefore inducing a uniform composition in each tetramino type. A rational specification (built by Algorithm 4) exists for Tetris tessellations of any fixed width, but the additional constraint on composition provably throws the associated language out of the context-free class. Therefore, we choose to model the generation of uniformly distributed Tetris tessellations as a multivariate generation within a rational language. Such tessellations could in turn be used as a basic construct to build hard instances for the offline version of the algorithmic Tetris problems studied in [4] and [11].



Input: Board width w and flat boundary B_w
Output: Q the set of states and δ the transition function of A_w = (𝒫, Q, B_w, {B_w}, δ)
begin
    (Q, δ) ← ({B_w}, ∅)
    S ← {B_w}
    while S ≠ ∅ do
        B ← pop S
        for p ∈ 𝒫_B do
            B' ← B − p;
            if B' ∉ Q then
                Q ← Q ∪ {B'};
                S ← push B'
            end
            δ ← δ ∪ {(B, p, B')}
        end
    end
    return (Q, δ)
end

Algorithm 4: Constructing the automaton A_w for tessellations of width w. Right: Growth of the number of states for increasing values of w.

6.1 Building the automaton of Tetris tessellations

First, let us find an unambiguous decomposition of Tetris tessellations. The idea is to focus on the state of the upper band of the tessellation of height 4, or boundary of a partial tessellation. In particular, for (complete) Tetris tessellations the upper band is completely filled and the associated boundary is flat. One can investigate the different ways to get to a given boundary $B$ by simulating the removal from $B$ of a piece $p$, completing the boundary after each removal so that the highest non-empty position stays on the top row. Without further restriction on the position of removal, such a decomposition would be ambiguous and give rise to an infinite number of different boundaries. Consequently, we enforce a canonical order on the removal of pieces by restricting it to a set of (possibly rotated) pieces $\mathcal{P}_B$ positioned such that: a) the upper-rightmost position of the piece matches that of the boundary, and b) the piece is entirely contained in the boundary. We refer to the induced decomposition as the disassembly decomposition.

Proposition 4. The disassembly decomposition generates sequences of removals from and to flat boundaries that are in bijection with Tetris tessellations.

Proof (Sketch). Let us briefly discuss the correctness of this decomposition, or equivalently that the sequences of $k$ removals leading from a flat boundary $B_w$ to itself are in bijection with the tessellations of width $w$ using $k$ pieces. First, let us notice that the decomposition is unambiguous, since all the local removals share at least one position (the upper-rightmost of the boundary) and are therefore strongly ordered. Furthermore, the decomposition is also provably complete by induction on the number $n$ of pieces, since any tessellation has an upper-rightmost position which, upon removal, gives another tessellation of smaller size, and completeness of the decomposition propagates from tessellations of size $n$ to size $n+1$. Finally, it gives rise to a finite number of states, since the difference between the highest and lowest points in any reached boundary does not exceed the maximal height of a piece.

The finiteness of the state space suggests Algorithm 4, which builds the automaton $A_w$ generating tessellations of width $w$. Notice that the resulting automaton is not necessarily co-accessible, since the removal of some piece can create boundaries that cannot be completed into a flat one through any sequence of removals. Consequently, we added in our implementation a test of connectedness that discards any boundary having a (dis)connected component involving a number of blocks that is not a multiple of 4, as such boundaries clearly cannot reach a flat state again. Running a minimization algorithm on the resulting automata confirms the expected explosion in the number of states (see Algorithm 4) required for increasing values of $w$.

Fig. 2. Left: Tetris tessellation associated with a unique instance. Only the most relevant dependency points are displayed here (arrows) and pieces are labelled with their rank in the only compatible instance. Duplicating the gadget preserves the uniqueness of the associated instance while allowing for the generation of tessellations of arbitrarily large dimensions. Right: Tessellation realized by $\binom{h}{h/2} \in \Theta(2^h/\sqrt{h})$ different instances.

6.2 Random generation 

First, we point out that the automaton has matching initial and final states, so that, once it is restricted 
to accessible and co-accessible states, it is strongly connected and our theorems regarding the complexity 
of our generators apply. One can then translate the automaton transitions into a system of functional 
equations involving the (rational) generating functions associated with each state. Solving the system 
gives the generating functions, from which much information can be extracted. 
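As a minimal sketch of this translation, the coefficients of the rational generating functions can be obtained by iterating the linear system $F_q(z) = [q \text{ final}] + z \sum_{q \xrightarrow{a} q'} F_{q'}(z)$ one coefficient at a time. The automaton `NO_BB` below is a hypothetical two-state example over {a, b} (words avoiding the factor bb), not the Tetris automaton $A_w$, and `count_words` is an illustrative name.

```python
# Hypothetical toy automaton over {a, b}: words avoiding the factor "bb".
# State 0: last letter was not b; state 1: last letter was b.
NO_BB = {0: [("a", 0), ("b", 1)], 1: [("a", 0)]}


def count_words(transitions, initial, finals, n):
    """Return [z^m] F_initial(z) for m = 0..n, where the rational generating
    functions satisfy F_q(z) = [q final] + z * sum_{q -a-> q'} F_q'(z).

    `transitions` maps every state to its list of (letter, target) pairs."""
    # c[q] = number of accepted words of the current length read from q
    c = {q: (1 if q in finals else 0) for q in transitions}
    counts = [c[initial]]
    for _ in range(n):
        # One extra letter: sum over outgoing transitions (coefficient of z^{m+1})
        c = {q: sum(c[t] for _, t in transitions[q]) for q in transitions}
        counts.append(c[initial])
    return counts
```

For this toy automaton, `count_words(NO_BB, 0, {0, 1}, 5)` gives the Fibonacci numbers 1, 2, 3, 5, 8, 13. The same iteration applies unchanged to any finite automaton, at a cost linear in n times the number of transitions.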

For instance, fixing the width w = 6 and a number n = 105 of pieces, one obtains approximately 
$3 \cdot 10^{71}$ potential tessellations, and extracting coefficients of suitable derivatives yields the 
following average frequency of each type of piece: 

Piece            #1     #2     #3     #4     #5     #6     #7
Frequency (%)   7.90  10.55  20.42  20.42  17.00   7.90  15.81
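The average frequencies in such a table are coefficients of partial derivatives $\partial F / \partial u_a$ evaluated at $u = 1$, divided by $n$ times the number of words. The sketch below computes the same quantity by dynamic programming, tracking both the number of accepted words and the number of letter occurrences, with exact rational arithmetic. It runs on a hypothetical toy automaton (words over {a, b} avoiding bb) rather than on the Tetris automaton; `letter_frequencies` is an illustrative name.

```python
from fractions import Fraction

# Hypothetical toy automaton over {a, b}: words avoiding the factor "bb".
NO_BB = {0: [("a", 0), ("b", 1)], 1: [("a", 0)]}


def letter_frequencies(transitions, initial, finals, n, alphabet):
    """Average frequency of each letter over the accepted words of length n
    (n >= 1, at least one accepted word assumed)."""
    count = {q: (1 if q in finals else 0) for q in transitions}
    occ = {q: {a: 0 for a in alphabet} for q in transitions}
    for _ in range(n):
        new_count, new_occ = {}, {}
        for q in transitions:
            # Words from q are letter b followed by a word accepted from t.
            new_count[q] = sum(count[t] for _, t in transitions[q])
            new_occ[q] = {
                a: sum(occ[t][a] + (count[t] if b == a else 0)
                       for b, t in transitions[q])
                for a in alphabet
            }
        count, occ = new_count, new_occ
    total = count[initial]
    return {a: Fraction(occ[initial][a], n * total) for a in alphabet}
```

For n = 3 the five accepted words (aaa, aab, aba, baa, bab) contain 5 b's out of 15 letters, so the sketch returns a frequency of 1/3 for b.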

Consequently, the average composition of a Tetris tessellation is incompatible with the bag strategy, 
which induces uniformly distributed pieces. One can then use the results of Section 3 to compute a 
set of weights that ensures a 1/7 proportion of each type of piece: 
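The weight computation of Section 3 is not reproduced here. As a stand-in, the sketch below uses a simple multiplicative fixed-point heuristic (iterative scaling: each weight is multiplied by its target frequency divided by the current expected frequency) on exact weighted frequencies. The toy automaton and the names `weighted_frequencies` and `fit_weights` are hypothetical, convergence of the heuristic is assumed rather than guaranteed in general, and every letter must have a positive frequency.

```python
# Hypothetical toy automaton over {a, b}: words avoiding the factor "bb".
NO_BB = {0: [("a", 0), ("b", 1)], 1: [("a", 0)]}


def weighted_frequencies(transitions, initial, finals, n, weights):
    """Expected letter frequencies over accepted words of length n when a word w
    has probability proportional to prod_a weights[a] ** |w|_a."""
    count = {q: (1.0 if q in finals else 0.0) for q in transitions}
    occ = {q: {a: 0.0 for a in weights} for q in transitions}
    for _ in range(n):
        new_count, new_occ = {}, {}
        for q in transitions:
            new_count[q] = sum(weights[b] * count[t] for b, t in transitions[q])
            new_occ[q] = {
                a: sum(weights[b] * (occ[t][a] + (count[t] if b == a else 0.0))
                       for b, t in transitions[q])
                for a in weights
            }
        count, occ = new_count, new_occ
    return {a: occ[initial][a] / (n * count[initial]) for a in weights}


def fit_weights(transitions, initial, finals, n, target, iterations=200):
    """Heuristic update w_a <- w_a * target_a / freq_a, renormalized each round."""
    weights = {a: 1.0 for a in target}
    for _ in range(iterations):
        freq = weighted_frequencies(transitions, initial, finals, n, weights)
        weights = {a: weights[a] * target[a] / freq[a] for a in target}
        scale = sum(weights.values())  # a global rescaling leaves frequencies unchanged
        weights = {a: w / scale for a, w in weights.items()}
    return weights
```

Asking for 40% of b's on words of length 30 raises the weight of b above that of a, since uniform words avoiding bb contain only about 28% of b's.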

Piece            #1     #2     #3     #4     #5     #6     #7
Weight          0.93   0.84   0.38   0.38   0.46   0.93   0.42
Frequency (%)   14.3   14.1   14.2   14.2   14.2   14.3   14.5

A weighted random generation for w = 6 and n = 105, coupled with a rejection step that only accepts 
tessellations in which each type of piece occurs 15 ± 1 times, gives the tessellations drawn in Figure 3. 
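The generate-then-reject scheme can be sketched on a toy example. For simplicity this sketch swaps in an exact-size weighted sampler based on precomputed weighted counts (the recursive method mentioned in the introduction) instead of the paper's Boltzmann sampler. The automaton, the function names, and the parameters are hypothetical, and the loop never terminates if no word meets the tolerance.

```python
import random

# Hypothetical toy automaton over {a, b}: words avoiding the factor "bb".
NO_BB = {0: [("a", 0), ("b", 1)], 1: [("a", 0)]}


def weighted_tables(transitions, finals, n, weights):
    """table[m][q] = total weight of accepted words of length m read from q."""
    table = [{q: (1.0 if q in finals else 0.0) for q in transitions}]
    for _ in range(n):
        prev = table[-1]
        table.append({q: sum(weights[b] * prev[t] for b, t in transitions[q])
                      for q in transitions})
    return table


def draw_word(transitions, initial, table, weights, n, rng):
    """Pick each transition with probability proportional to its weight times
    the total weight of the words that can complete it."""
    q, word = initial, []
    for m in range(n, 0, -1):
        options = transitions[q]
        scores = [weights[b] * table[m - 1][t] for b, t in options]
        letter, q = rng.choices(options, weights=scores)[0]
        word.append(letter)
    return "".join(word)


def sample_with_rejection(transitions, initial, finals, n, weights,
                          target_counts, tolerance, seed=0):
    """Redraw until every letter count is within `tolerance` of its target."""
    rng = random.Random(seed)
    table = weighted_tables(transitions, finals, n, weights)
    while True:
        word = draw_word(transitions, initial, table, weights, n, rng)
        if all(abs(word.count(a) - k) <= tolerance
               for a, k in target_counts.items()):
            return word
```

The tables are computed once and reused by every attempt, so each rejected draw costs only O(n) choices.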

6.3 From random Tetris tessellations to Tetris instances 

Proposition 5. For any Tetris tessellation T, there exists an instance (sequence of pieces) from which 
T can be obtained. 

Proof. Let us assume that T is a tessellation of a $w \times n$ rectangle using tetraminoes, and let us call 
dependency point any contact between the southward face of a piece $B_1$ and the northward face of 
a piece $B_2$. Such points induce dependencies $B_1 \to B_2$, which are the arcs of a dependency graph 
$D = (T, E)$. Additionally, each edge is labelled with the coordinates of its associated dependency 
point. 

It can be shown that D is acyclic, by pointing out that any path along D is labelled with coordinates 
that are either increasing on the y-axis or monotonic on the x-axis. Let us start by noticing that, 
aside from the S and Z pieces, all types of pieces exhibit northward faces that are strictly higher 
than their southward ones. Furthermore, any assembly of distinct pieces exposes northward faces that 
are at greater y-coordinates than their dependency point, inducing an increase of y-coordinate in the 
path. Consequently, there exist only two configurations of dependent pieces $A \to B$, in which B is 
an S piece or a Z piece, such that B exposes a southward face at the same height as their dependency 
point. The only way for a path in D not to increase in y-coordinate is then to feature a sequence of 
S (resp. Z) pieces, inducing a monotonic behavior on the x-axis, which proves our claim; the acyclic 
nature of D follows. Finally, the acyclicity of D implies the existence of a sequence of pieces realizing 
T: D always has a source, that is, a piece on which no other piece rests, and such a piece can always be 
removed; iterating this removal and reversing the resulting order yields an instance producing T. 

Fig. 3. Fifteen Tetris tessellations of width 6 having uniform composition (±1) in the different pieces. 
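The last step of the proof is constructive. A minimal sketch, assuming D is given as a list of arcs (upper, lower) between hypothetical piece identifiers, computes a drop order with Kahn's topological sort: a piece is emitted once every piece it rests on has been placed. This is equivalent to the source-removal argument read in reverse. The function name is illustrative.

```python
from collections import deque


def instance_from_dependencies(pieces, arcs):
    """Return an order in which to drop the pieces of a tessellation.

    `arcs` holds pairs (upper, lower) meaning `upper` rests on `lower`
    (the arc upper -> lower of D). Kahn's algorithm on the reversed arcs."""
    waiting = {p: 0 for p in pieces}   # number of supports not yet placed
    resting_on = {p: [] for p in pieces}  # pieces that rest on p
    for upper, lower in arcs:
        waiting[upper] += 1
        resting_on[lower].append(upper)
    ready = deque(p for p in pieces if waiting[p] == 0)
    order = []
    while ready:
        p = ready.popleft()
        order.append(p)
        for q in resting_on[p]:
            waiting[q] -= 1
            if waiting[q] == 0:
                ready.append(q)
    if len(order) != len(pieces):
        raise ValueError("dependency graph has a cycle")
    return order
```

Repeated contacts between the same two pieces may appear as duplicate arcs; they are counted and released symmetrically, so the result is unaffected.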

Let us discuss the limitations of Tetris tessellations as a model for Tetris instances. First, it can 
be remarked that Tetris tessellations do not capture every possible Tetris game ending with an empty 
board, as one may temporarily leave holes, which amounts to disconnecting pieces in the tessellation 
representation. Secondly, there generally exist several free pieces to choose from while rebuilding a 
tessellation, and therefore different instances can lead to a given tessellation. Furthermore, the 
number of instances highly depends on the actual tessellation (from one to exponentially many in n, 
as illustrated in Figure 2). Consequently, using the DAGs associated with Tetris histories to draw 
instances for the offline version of Tetris algorithmic problems, studied in [4], would exponentially 
favor certain instances over others, and the uniform random generation of instances ensuring the 
feasibility of a perfect Tetris game remains a challenging problem. 
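The dependence of the number of instances on the tessellation can be checked on small examples by counting the linear extensions of D, that is, the drop orders compatible with its arcs. The sketch below uses the same hypothetical (upper, lower) arc representation and an exponential-time dynamic programme over subsets of placed pieces, so it only suits small tessellations. Two independent stacks of k pieces, assumed here only as an illustration, give $\binom{2k}{k}$ orders, which shows how the count can grow exponentially.

```python
from functools import lru_cache


def count_instances(pieces, arcs):
    """Number of drop orders consistent with `arcs` = pairs (upper, lower)."""
    index = {p: i for i, p in enumerate(pieces)}
    support = [0] * len(pieces)  # bitmask of the pieces each piece rests on
    for upper, lower in arcs:
        support[index[upper]] |= 1 << index[lower]
    full = (1 << len(pieces)) - 1

    @lru_cache(maxsize=None)
    def ways(placed):
        # Number of ways to finish once the pieces in bitmask `placed` are down.
        if placed == full:
            return 1
        total = 0
        for i in range(len(pieces)):
            bit = 1 << i
            if not (placed & bit) and (support[i] & placed) == support[i]:
                total += ways(placed | bit)
        return total

    return ways(0)
```

A single stack admits exactly one order, whereas two stacks of three pieces already admit 20.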

7 Conclusion 

In this paper, we adapted and applied a general methodology for the multivariate random generation 
of combinatorial objects. Under explicit and natural conditions, random generators having complexity 
in $O(n^{2+k/2})$ were derived for exact-size and exact-composition generation, outperforming the best 
known algorithms for this problem (in $O(n^k)$ and $O(n^{2k})$ respectively for rational and context-free 
languages). Furthermore, provided a small (linear) tolerance is allowed on the size of generated objects, 
and an $\Omega(\sqrt{n})$ one in the other dimensions, our generators produce objects in linear expected time. 



We applied these principles to the generation of perfect Tetris tessellations with uniform statistics in 
tetraminoes, and discussed the generation of Tetris games from this model. 

This paper is the first step toward a general analysis of multi-parameter Boltzmann sampling. 
Compared to its alternative based on the recursive method, the resulting method is not only theoretically 
faster, but also requires only $O(n)$ storage, and its time complexity seems less affected by larger 
specifications. Nevertheless, many questions are left open, for instance regarding the nature of 
the dependency between the weights and reasonable frequencies, which would allow us to address the 
complexity of Phase 2 in a much more general setting. Furthermore, the success of our programme 
critically depends on the existence of suitable weights, which is not guaranteed, e.g. when the targeted 
distribution is incompatible with some dependencies induced by the grammar. A future direction of 
this work might investigate non-trivial, sufficient, yet tight, conditions under which the targeted 
composition can be achieved on average. 

Since multivariate Boltzmann samplers can be obtained in any situation where the distribution is 
well-concentrated, one may envision extensions to other classes, including constrained trees, 
permutations with a fixed number of cycles, and functional graphs with a controlled number of components. 
A first extension may consider simple Pólya operators and extend some of the multivariate theorems 
established in the present work. The requirement of strong connectedness (or irreducibility) could 
be questioned or categorized using (sub/super)-critical compositions. Another direction is the use of 
Hwang's quasi-powers theorem, giving rise to low-variance distributions, for a general treatment of 
the bivariate case. 